Learning with Taxonomies: Classifying Documents and Words
Authors
Abstract
Automatically extracting semantic information about word meaning and document topic from text typically involves a large number of classes. Such classes may represent predefined word senses, topics, or document categories, and they are often organized in a taxonomy. The taxonomy encodes important information that should be exploited when learning classifiers from labeled training data. To that end, this paper presents an extension of multiclass Support Vector Machine learning that can incorporate prior knowledge about class relationships. This knowledge can be encoded in the form of class attributes, similarities between classes, or even a kernel function defined over the set of classes. The paper also discusses how to specify and optimize meaningful loss functions based on the relative position of classes in the taxonomy. We report experimental results for text categorization and for word sense classification.
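The abstract does not spell out the construction, so the following is only a minimal sketch of one common way to realize the two ideas it names. It rests on several assumptions. Each class's attribute vector Lambda(y) is taken to be the indicator of its ancestors in the taxonomy. The class kernel is taken to be the inner product of these vectors. The joint feature map is assumed to be Phi(x, y) = Lambda(y) ⊗ x. The taxonomy-based loss is taken to be the number of edges between two classes in the tree. The toy taxonomy, the data, and all function names (`ancestors`, `class_attributes`, `class_kernel`, `tree_loss`, `train`, ...) are hypothetical. For brevity, training uses plain stochastic subgradient descent on a margin-rescaled hinge loss rather than a dedicated SVM solver.

```python
# Sketch: taxonomy-aware multiclass SVM-style learning (hypothetical toy setup).

# Hypothetical toy taxonomy, stored as child -> parent (None marks the root).
PARENT = {
    "root": None,
    "sports": "root",
    "politics": "root",
    "soccer": "sports",
    "tennis": "sports",
    "elections": "politics",
}
NODES = sorted(PARENT)  # fixed ordering of all taxonomy nodes
LEAVES = sorted(n for n in PARENT if n not in PARENT.values())  # predictable classes


def ancestors(node):
    """Return the path from node up to the root, including node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def class_attributes(y):
    """Lambda(y): 1.0 for each taxonomy node on the path from y to the root, else 0.0."""
    on_path = set(ancestors(y))
    return [1.0 if n in on_path else 0.0 for n in NODES]


def class_kernel(y1, y2):
    """<Lambda(y1), Lambda(y2)>: the number of ancestors (root included) two classes share."""
    return sum(a * b for a, b in zip(class_attributes(y1), class_attributes(y2)))


def tree_loss(y_true, y_pred):
    """Delta(y_true, y_pred): number of edges on the taxonomy path between two classes."""
    a, b = ancestors(y_true), ancestors(y_pred)
    common = set(a) & set(b)
    # Count the steps from each class up to their lowest common ancestor.
    return sum(n not in common for n in a) + sum(n not in common for n in b)


def joint_features(x, y):
    """Phi(x, y): outer product of Lambda(y) and x, flattened node by node."""
    return [lam * xi for lam in class_attributes(y) for xi in x]


def score(w, x, y):
    """Linear discriminant f(x, y) = <w, Phi(x, y)>."""
    return sum(wi * fi for wi, fi in zip(w, joint_features(x, y)))


def predict(w, x):
    """Pick the leaf class with the highest discriminant score."""
    return max(LEAVES, key=lambda y: score(w, x, y))


def train(data, dim, epochs=20, lr=0.1, reg=0.01):
    """Stochastic subgradient descent on the L2-regularized, margin-rescaled hinge loss
    max_y [Delta(y_i, y) + f(x_i, y)] - f(x_i, y_i)."""
    w = [0.0] * (len(NODES) * dim)
    for _ in range(epochs):
        for x, y_true in data:
            # Loss-augmented inference: find the most violating class.
            y_hat = max(LEAVES, key=lambda y: tree_loss(y_true, y) + score(w, x, y))
            grad = [reg * wi for wi in w]
            violation = tree_loss(y_true, y_hat) + score(w, x, y_hat) - score(w, x, y_true)
            if violation > 0:
                # Subgradient of the hinge term: Phi(x, y_hat) - Phi(x, y_true).
                phi_true, phi_hat = joint_features(x, y_true), joint_features(x, y_hat)
                grad = [g + h - t for g, h, t in zip(grad, phi_hat, phi_true)]
            w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical toy data: one feature per leaf class.
    data = [([1.0, 0.0, 0.0], "soccer"), ([0.0, 1.0, 0.0], "tennis"), ([0.0, 0.0, 1.0], "elections")]
    w = train(data, dim=3)
    print([predict(w, x) for x, _ in data])  # expected: ['soccer', 'tennis', 'elections']
    print(tree_loss("soccer", "tennis"), tree_loss("soccer", "elections"))  # 2 4
```

Because Phi is a tensor product, the induced kernel on joint inputs factorizes as <Lambda(y), Lambda(y')> times <x, x'>. As a result, sibling classes such as soccer and tennis share the weights of their common ancestors. Meanwhile, the tree loss makes confusing soccer with tennis (loss 2) cheaper than confusing it with elections (loss 4).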
Similar resources
Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents' contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...
Ontology Based Machine Learning for Semantic Multiclass Classification
Following the development of semantic web technologies, many ontologies and thesauri have been proposed to index resources during the last decade. However, despite their expressiveness, those knowledge models do not always cover all the points of interest within dedicated applications. Therefore, alternative ad hoc taxonomies have been developed to support resource classification processes. This ...
Unsupervised Concept Hierarchy Induction: Learning the Semantics of Words
Unsupervised concept hierarchy induction, or taxonomy learning, is the task of hierarchically classifying word senses in order to develop a taxonomy of concepts. Taxonomies of concepts such as the one found in WordNet (Fellbaum, 1998) are important resources for a variety of Natural Language Processing (NLP) applications, including word sense disambiguation (Ramakrishnan et al., 2004; Navigli & Velardi, 2004...
FOLKSONOMY - SUPPLEMENTING RICHE EXPERT BASED TAXONOMY BY TERMS FROM ONLINE DOCUMENTS (Pilot Study)
RICHE (Research Inventory of Child Health in Europe) is a platform developed and funded under the Health domain of the 7th European Framework Program. The platform's search engine is expected to use the multilingual taxonomy of terms for processing and classifying large volumes of documents in the RICHE repository. So far, the experts participating in this project have produced the initial version of ...